Declarative Parallel Programming for GPUs
نویسندگان
چکیده
The recent rise in the popularity of Graphics Processing Units (GPUs) has been fueled by software frameworks, such as NVIDIA’s Compute Unified Device Architecture (CUDA) and Khronos Group’s OpenCL that make GPUs available for general purpose computing. However, CUDA and OpenCL are still lowlevel approaches that require users to handle details about data layout and movement across levels of memory hierarchy. We propose a declarative approach to coordinating computation and data movement between CPU and GPU, through a domain-specific language that we called Harlan. Not only does a declarative language obviate the need for the programmer to write low-level error-prone boilerplate code, by raising the abstraction of specifying GPU computation it also allows the compiler to optimize data movement and overlap between CPU and GPU computation. By focusing on the “what”, and not the “how”, of data layout, data movement, and computation scheduling, the language eliminates the sources of many programming errors related to correctness and performance.
منابع مشابه
Accelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
متن کاملDeclarative Programming Techniques for Many-Core Architectures
Future manycore architectures are likely to have heterogeneous computing resources which will include conventional CPUs as well as variants of today’s GPUs and reconfigurable logic like FPGAs. Many of the techniques that the reconfigurable computing community has championed will find new applications in mainstream applications. One challenge posed by such manycore architectures is the requireme...
متن کاملCnC-CUDA: Declarative Programming for GPUs
The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Instead, future computer systems are expected to be built using homogeneous and heterogeneous many-core processors with 10’s to 100’s of cores per chip, and complex hardware designs to address the challenges of concurrency, energy eff...
متن کاملA GPU Implementation of Large Neighborhood Search for Solving Constraint Optimization Problems
Constraint programming has gained prominence as an effective and declarative paradigm for modeling and solving complex combinatorial problems. In particular, techniques based on local search have proved practical to solve real-world problems, providing a good compromise between optimality and efficiency. In spite of the natural presence of concurrency, there has been relatively limited effort t...
متن کاملHeterogeneous Programming with Single Operation Multiple Data
Heterogeneity is omnipresent in today’s commodity computational systems, which comprise at least one multi-core Central Processing Unit (CPU) and one Graphics Processing Unit (GPU). Nonetheless, all this computing power is not being harnessed in mainstream computing, as the programming of these systems entails many details of the underlying architecture and of its distinct execution models. Cur...
متن کامل